FIFA 19 ANALYSIS

About the dataset

data.csv contains the attributes of the players in the latest edition, FIFA 2019, such as Age, Nationality, Overall, Potential, Club, Value, Wage, Preferred Foot, International Reputation, Weak Foot, Skill Moves, Work Rate, Position, Jersey Number, Joined, Loaned From, Contract Valid Until, Height, Weight, LS, ST, RS, LW, LF, CF, RF, RW, LAM, CAM, RAM, LM, LCM, CM, RCM, RM, LWB, LDM, CDM, RDM, RWB, LB, LCB, CB, RCB, RB, Crossing, Finishing, HeadingAccuracy, ShortPassing, Volleys, Dribbling, Curve, FKAccuracy, LongPassing, BallControl, Acceleration, SprintSpeed, Agility, Reactions, Balance, ShotPower, Jumping, Stamina, Strength, LongShots, Aggression, Interceptions, Positioning, Vision, Penalties, Composure, Marking, StandingTackle, SlidingTackle, GKDiving, GKHandling, GKKicking, GKPositioning, GKReflexes, and Release Clause. SOURCE: https://www.kaggle.com/karangadiya/fifa19

In [1]:
#importing required libraries.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
%matplotlib inline
In [2]:
pd.set_option('display.max_columns', 100) #--> IN ORDER TO DISPLAY ALL COLUMNS IN THE DATASET.
In [3]:
df = pd.read_csv(r"E:\Data_Science_Journey\Course Notes\FWD\Data Visualization\archive\data.csv", index_col= 0)
df
Out[3]:
ID Name Age Photo Nationality Flag Overall Potential Club Club Logo Value Wage Special Preferred Foot International Reputation Weak Foot Skill Moves Work Rate Body Type Real Face Position Jersey Number Joined Loaned From Contract Valid Until Height Weight LS ST RS LW LF CF RF RW LAM CAM RAM LM LCM CM RCM RM LWB LDM CDM RDM RWB LB LCB CB RCB RB Crossing Finishing HeadingAccuracy ShortPassing Volleys Dribbling Curve FKAccuracy LongPassing BallControl Acceleration SprintSpeed Agility Reactions Balance ShotPower Jumping Stamina Strength LongShots Aggression Interceptions Positioning Vision Penalties Composure Marking StandingTackle SlidingTackle GKDiving GKHandling GKKicking GKPositioning GKReflexes Release Clause
0 158023 L. Messi 31 https://cdn.sofifa.org/players/4/19/158023.png Argentina https://cdn.sofifa.org/flags/52.png 94 94 FC Barcelona https://cdn.sofifa.org/teams/2/light/241.png €110.5M €565K 2202 Left 5.0 4.0 4.0 Medium/ Medium Messi Yes RF 10.0 Jul 1, 2004 NaN 2021 5'7 159lbs 88+2 88+2 88+2 92+2 93+2 93+2 93+2 92+2 93+2 93+2 93+2 91+2 84+2 84+2 84+2 91+2 64+2 61+2 61+2 61+2 64+2 59+2 47+2 47+2 47+2 59+2 84.0 95.0 70.0 90.0 86.0 97.0 93.0 94.0 87.0 96.0 91.0 86.0 91.0 95.0 95.0 85.0 68.0 72.0 59.0 94.0 48.0 22.0 94.0 94.0 75.0 96.0 33.0 28.0 26.0 6.0 11.0 15.0 14.0 8.0 €226.5M
1 20801 Cristiano Ronaldo 33 https://cdn.sofifa.org/players/4/19/20801.png Portugal https://cdn.sofifa.org/flags/38.png 94 94 Juventus https://cdn.sofifa.org/teams/2/light/45.png €77M €405K 2228 Right 5.0 4.0 5.0 High/ Low C. Ronaldo Yes ST 7.0 Jul 10, 2018 NaN 2022 6'2 183lbs 91+3 91+3 91+3 89+3 90+3 90+3 90+3 89+3 88+3 88+3 88+3 88+3 81+3 81+3 81+3 88+3 65+3 61+3 61+3 61+3 65+3 61+3 53+3 53+3 53+3 61+3 84.0 94.0 89.0 81.0 87.0 88.0 81.0 76.0 77.0 94.0 89.0 91.0 87.0 96.0 70.0 95.0 95.0 88.0 79.0 93.0 63.0 29.0 95.0 82.0 85.0 95.0 28.0 31.0 23.0 7.0 11.0 15.0 14.0 11.0 €127.1M
2 190871 Neymar Jr 26 https://cdn.sofifa.org/players/4/19/190871.png Brazil https://cdn.sofifa.org/flags/54.png 92 93 Paris Saint-Germain https://cdn.sofifa.org/teams/2/light/73.png €118.5M €290K 2143 Right 5.0 5.0 5.0 High/ Medium Neymar Yes LW 10.0 Aug 3, 2017 NaN 2022 5'9 150lbs 84+3 84+3 84+3 89+3 89+3 89+3 89+3 89+3 89+3 89+3 89+3 88+3 81+3 81+3 81+3 88+3 65+3 60+3 60+3 60+3 65+3 60+3 47+3 47+3 47+3 60+3 79.0 87.0 62.0 84.0 84.0 96.0 88.0 87.0 78.0 95.0 94.0 90.0 96.0 94.0 84.0 80.0 61.0 81.0 49.0 82.0 56.0 36.0 89.0 87.0 81.0 94.0 27.0 24.0 33.0 9.0 9.0 15.0 15.0 11.0 €228.1M
3 193080 De Gea 27 https://cdn.sofifa.org/players/4/19/193080.png Spain https://cdn.sofifa.org/flags/45.png 91 93 Manchester United https://cdn.sofifa.org/teams/2/light/11.png €72M €260K 1471 Right 4.0 3.0 1.0 Medium/ Medium Lean Yes GK 1.0 Jul 1, 2011 NaN 2020 6'4 168lbs NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 17.0 13.0 21.0 50.0 13.0 18.0 21.0 19.0 51.0 42.0 57.0 58.0 60.0 90.0 43.0 31.0 67.0 43.0 64.0 12.0 38.0 30.0 12.0 68.0 40.0 68.0 15.0 21.0 13.0 90.0 85.0 87.0 88.0 94.0 €138.6M
4 192985 K. De Bruyne 27 https://cdn.sofifa.org/players/4/19/192985.png Belgium https://cdn.sofifa.org/flags/7.png 91 92 Manchester City https://cdn.sofifa.org/teams/2/light/10.png €102M €355K 2281 Right 4.0 5.0 4.0 High/ High Normal Yes RCM 7.0 Aug 30, 2015 NaN 2023 5'11 154lbs 82+3 82+3 82+3 87+3 87+3 87+3 87+3 87+3 88+3 88+3 88+3 88+3 87+3 87+3 87+3 88+3 77+3 77+3 77+3 77+3 77+3 73+3 66+3 66+3 66+3 73+3 93.0 82.0 55.0 92.0 82.0 86.0 85.0 83.0 91.0 91.0 78.0 76.0 79.0 91.0 77.0 91.0 63.0 90.0 75.0 91.0 76.0 61.0 87.0 94.0 79.0 88.0 68.0 58.0 51.0 15.0 13.0 5.0 10.0 13.0 €196.4M
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
18202 238813 J. Lundstram 19 https://cdn.sofifa.org/players/4/19/238813.png England https://cdn.sofifa.org/flags/14.png 47 65 Crewe Alexandra https://cdn.sofifa.org/teams/2/light/121.png €60K €1K 1307 Right 1.0 2.0 2.0 Medium/ Medium Lean No CM 22.0 May 3, 2017 NaN 2019 5'9 134lbs 42+2 42+2 42+2 44+2 44+2 44+2 44+2 44+2 45+2 45+2 45+2 44+2 45+2 45+2 45+2 44+2 44+2 45+2 45+2 45+2 44+2 45+2 45+2 45+2 45+2 45+2 34.0 38.0 40.0 49.0 25.0 42.0 30.0 34.0 45.0 43.0 54.0 57.0 60.0 49.0 76.0 43.0 55.0 40.0 47.0 38.0 46.0 46.0 39.0 52.0 43.0 45.0 40.0 48.0 47.0 10.0 13.0 7.0 8.0 9.0 €143K
18203 243165 N. Christoffersson 19 https://cdn.sofifa.org/players/4/19/243165.png Sweden https://cdn.sofifa.org/flags/46.png 47 63 Trelleborgs FF https://cdn.sofifa.org/teams/2/light/703.png €60K €1K 1098 Right 1.0 2.0 2.0 Medium/ Medium Normal No ST 21.0 Mar 19, 2018 NaN 2020 6'3 170lbs 45+2 45+2 45+2 39+2 42+2 42+2 42+2 39+2 40+2 40+2 40+2 38+2 35+2 35+2 35+2 38+2 30+2 31+2 31+2 31+2 30+2 29+2 32+2 32+2 32+2 29+2 23.0 52.0 52.0 43.0 36.0 39.0 32.0 20.0 25.0 40.0 41.0 39.0 38.0 40.0 52.0 41.0 47.0 43.0 67.0 42.0 47.0 16.0 46.0 33.0 43.0 42.0 22.0 15.0 19.0 10.0 9.0 9.0 5.0 12.0 €113K
18204 241638 B. Worman 16 https://cdn.sofifa.org/players/4/19/241638.png England https://cdn.sofifa.org/flags/14.png 47 67 Cambridge United https://cdn.sofifa.org/teams/2/light/1944.png €60K €1K 1189 Right 1.0 3.0 2.0 Medium/ Medium Normal No ST 33.0 Jul 1, 2017 NaN 2021 5'8 148lbs 45+2 45+2 45+2 45+2 46+2 46+2 46+2 45+2 44+2 44+2 44+2 44+2 38+2 38+2 38+2 44+2 34+2 30+2 30+2 30+2 34+2 33+2 28+2 28+2 28+2 33+2 25.0 40.0 46.0 38.0 38.0 45.0 38.0 27.0 28.0 44.0 70.0 69.0 50.0 47.0 58.0 45.0 60.0 55.0 32.0 45.0 32.0 15.0 48.0 43.0 55.0 41.0 32.0 13.0 11.0 6.0 5.0 10.0 6.0 13.0 €165K
18205 246268 D. Walker-Rice 17 https://cdn.sofifa.org/players/4/19/246268.png England https://cdn.sofifa.org/flags/14.png 47 66 Tranmere Rovers https://cdn.sofifa.org/teams/2/light/15048.png €60K €1K 1228 Right 1.0 3.0 2.0 Medium/ Medium Lean No RW 34.0 Apr 24, 2018 NaN 2019 5'10 154lbs 47+2 47+2 47+2 47+2 46+2 46+2 46+2 47+2 45+2 45+2 45+2 46+2 39+2 39+2 39+2 46+2 36+2 32+2 32+2 32+2 36+2 35+2 31+2 31+2 31+2 35+2 44.0 50.0 39.0 42.0 40.0 51.0 34.0 32.0 32.0 52.0 61.0 60.0 52.0 21.0 71.0 64.0 42.0 40.0 48.0 34.0 33.0 22.0 44.0 47.0 50.0 46.0 20.0 25.0 27.0 14.0 6.0 14.0 8.0 9.0 €143K
18206 246269 G. Nugent 16 https://cdn.sofifa.org/players/4/19/246269.png England https://cdn.sofifa.org/flags/14.png 46 66 Tranmere Rovers https://cdn.sofifa.org/teams/2/light/15048.png €60K €1K 1321 Right 1.0 3.0 2.0 Medium/ Medium Lean No CM 33.0 Oct 30, 2018 NaN 2019 5'10 176lbs 43+2 43+2 43+2 45+2 44+2 44+2 44+2 45+2 45+2 45+2 45+2 46+2 45+2 45+2 45+2 46+2 46+2 46+2 46+2 46+2 46+2 46+2 47+2 47+2 47+2 46+2 41.0 34.0 46.0 48.0 30.0 43.0 40.0 34.0 44.0 51.0 57.0 55.0 55.0 51.0 63.0 43.0 62.0 47.0 60.0 32.0 56.0 42.0 34.0 49.0 33.0 43.0 40.0 43.0 50.0 10.0 15.0 9.0 12.0 9.0 €165K

18207 rows × 88 columns

In this section I will dig into the data, looking for missing values, duplicates and anything else that needs cleaning.

Choosing columns of interest.

Here I will drop all the columns I don't plan to use in the analysis.

In [4]:
df.columns
Out[4]:
Index(['ID', 'Name', 'Age', 'Photo', 'Nationality', 'Flag', 'Overall',
       'Potential', 'Club', 'Club Logo', 'Value', 'Wage', 'Special',
       'Preferred Foot', 'International Reputation', 'Weak Foot',
       'Skill Moves', 'Work Rate', 'Body Type', 'Real Face', 'Position',
       'Jersey Number', 'Joined', 'Loaned From', 'Contract Valid Until',
       'Height', 'Weight', 'LS', 'ST', 'RS', 'LW', 'LF', 'CF', 'RF', 'RW',
       'LAM', 'CAM', 'RAM', 'LM', 'LCM', 'CM', 'RCM', 'RM', 'LWB', 'LDM',
       'CDM', 'RDM', 'RWB', 'LB', 'LCB', 'CB', 'RCB', 'RB', 'Crossing',
       'Finishing', 'HeadingAccuracy', 'ShortPassing', 'Volleys', 'Dribbling',
       'Curve', 'FKAccuracy', 'LongPassing', 'BallControl', 'Acceleration',
       'SprintSpeed', 'Agility', 'Reactions', 'Balance', 'ShotPower',
       'Jumping', 'Stamina', 'Strength', 'LongShots', 'Aggression',
       'Interceptions', 'Positioning', 'Vision', 'Penalties', 'Composure',
       'Marking', 'StandingTackle', 'SlidingTackle', 'GKDiving', 'GKHandling',
       'GKKicking', 'GKPositioning', 'GKReflexes', 'Release Clause'],
      dtype='object')
In [5]:
columnsOfInterest = ['ID', 'Name', 'Age', 'Nationality', 'Overall',
       'Potential', 'Club', 'Value', 'Wage', 'Special',
       'Preferred Foot', 'International Reputation', 'Weak Foot',
       'Skill Moves', 'Work Rate', 'Position','Height', 'Weight', 'Crossing',
       'Finishing', 'HeadingAccuracy', 'ShortPassing', 'Volleys', 'Dribbling',
       'Curve', 'FKAccuracy', 'LongPassing', 'BallControl', 'Acceleration',
       'SprintSpeed', 'Agility', 'Reactions', 'Balance', 'ShotPower',
       'Jumping', 'Stamina', 'Strength', 'LongShots', 'Aggression',
       'Interceptions', 'Positioning', 'Vision', 'Penalties', 'Composure',
       'Marking', 'StandingTackle', 'SlidingTackle']
In [6]:
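#--> Here is our new dataset. .copy() makes it an independent DataFrame rather than a slice of the original, which avoids SettingWithCopyWarning when we modify it later.
df = df[columnsOfInterest].copy()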
df
Out[6]:
ID Name Age Nationality Overall Potential Club Value Wage Special Preferred Foot International Reputation Weak Foot Skill Moves Work Rate Position Height Weight Crossing Finishing HeadingAccuracy ShortPassing Volleys Dribbling Curve FKAccuracy LongPassing BallControl Acceleration SprintSpeed Agility Reactions Balance ShotPower Jumping Stamina Strength LongShots Aggression Interceptions Positioning Vision Penalties Composure Marking StandingTackle SlidingTackle
0 158023 L. Messi 31 Argentina 94 94 FC Barcelona €110.5M €565K 2202 Left 5.0 4.0 4.0 Medium/ Medium RF 5'7 159lbs 84.0 95.0 70.0 90.0 86.0 97.0 93.0 94.0 87.0 96.0 91.0 86.0 91.0 95.0 95.0 85.0 68.0 72.0 59.0 94.0 48.0 22.0 94.0 94.0 75.0 96.0 33.0 28.0 26.0
1 20801 Cristiano Ronaldo 33 Portugal 94 94 Juventus €77M €405K 2228 Right 5.0 4.0 5.0 High/ Low ST 6'2 183lbs 84.0 94.0 89.0 81.0 87.0 88.0 81.0 76.0 77.0 94.0 89.0 91.0 87.0 96.0 70.0 95.0 95.0 88.0 79.0 93.0 63.0 29.0 95.0 82.0 85.0 95.0 28.0 31.0 23.0
2 190871 Neymar Jr 26 Brazil 92 93 Paris Saint-Germain €118.5M €290K 2143 Right 5.0 5.0 5.0 High/ Medium LW 5'9 150lbs 79.0 87.0 62.0 84.0 84.0 96.0 88.0 87.0 78.0 95.0 94.0 90.0 96.0 94.0 84.0 80.0 61.0 81.0 49.0 82.0 56.0 36.0 89.0 87.0 81.0 94.0 27.0 24.0 33.0
3 193080 De Gea 27 Spain 91 93 Manchester United €72M €260K 1471 Right 4.0 3.0 1.0 Medium/ Medium GK 6'4 168lbs 17.0 13.0 21.0 50.0 13.0 18.0 21.0 19.0 51.0 42.0 57.0 58.0 60.0 90.0 43.0 31.0 67.0 43.0 64.0 12.0 38.0 30.0 12.0 68.0 40.0 68.0 15.0 21.0 13.0
4 192985 K. De Bruyne 27 Belgium 91 92 Manchester City €102M €355K 2281 Right 4.0 5.0 4.0 High/ High RCM 5'11 154lbs 93.0 82.0 55.0 92.0 82.0 86.0 85.0 83.0 91.0 91.0 78.0 76.0 79.0 91.0 77.0 91.0 63.0 90.0 75.0 91.0 76.0 61.0 87.0 94.0 79.0 88.0 68.0 58.0 51.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
18202 238813 J. Lundstram 19 England 47 65 Crewe Alexandra €60K €1K 1307 Right 1.0 2.0 2.0 Medium/ Medium CM 5'9 134lbs 34.0 38.0 40.0 49.0 25.0 42.0 30.0 34.0 45.0 43.0 54.0 57.0 60.0 49.0 76.0 43.0 55.0 40.0 47.0 38.0 46.0 46.0 39.0 52.0 43.0 45.0 40.0 48.0 47.0
18203 243165 N. Christoffersson 19 Sweden 47 63 Trelleborgs FF €60K €1K 1098 Right 1.0 2.0 2.0 Medium/ Medium ST 6'3 170lbs 23.0 52.0 52.0 43.0 36.0 39.0 32.0 20.0 25.0 40.0 41.0 39.0 38.0 40.0 52.0 41.0 47.0 43.0 67.0 42.0 47.0 16.0 46.0 33.0 43.0 42.0 22.0 15.0 19.0
18204 241638 B. Worman 16 England 47 67 Cambridge United €60K €1K 1189 Right 1.0 3.0 2.0 Medium/ Medium ST 5'8 148lbs 25.0 40.0 46.0 38.0 38.0 45.0 38.0 27.0 28.0 44.0 70.0 69.0 50.0 47.0 58.0 45.0 60.0 55.0 32.0 45.0 32.0 15.0 48.0 43.0 55.0 41.0 32.0 13.0 11.0
18205 246268 D. Walker-Rice 17 England 47 66 Tranmere Rovers €60K €1K 1228 Right 1.0 3.0 2.0 Medium/ Medium RW 5'10 154lbs 44.0 50.0 39.0 42.0 40.0 51.0 34.0 32.0 32.0 52.0 61.0 60.0 52.0 21.0 71.0 64.0 42.0 40.0 48.0 34.0 33.0 22.0 44.0 47.0 50.0 46.0 20.0 25.0 27.0
18206 246269 G. Nugent 16 England 46 66 Tranmere Rovers €60K €1K 1321 Right 1.0 3.0 2.0 Medium/ Medium CM 5'10 176lbs 41.0 34.0 46.0 48.0 30.0 43.0 40.0 34.0 44.0 51.0 57.0 55.0 55.0 51.0 63.0 43.0 62.0 47.0 60.0 32.0 56.0 42.0 34.0 49.0 33.0 43.0 40.0 43.0 50.0

18207 rows × 47 columns

Checking Missing Values.

In [7]:
df.isnull().sum()
Out[7]:
ID                            0
Name                          0
Age                           0
Nationality                   0
Overall                       0
Potential                     0
Club                        241
Value                         0
Wage                          0
Special                       0
Preferred Foot               48
International Reputation     48
Weak Foot                    48
Skill Moves                  48
Work Rate                    48
Position                     60
Height                       48
Weight                       48
Crossing                     48
Finishing                    48
HeadingAccuracy              48
ShortPassing                 48
Volleys                      48
Dribbling                    48
Curve                        48
FKAccuracy                   48
LongPassing                  48
BallControl                  48
Acceleration                 48
SprintSpeed                  48
Agility                      48
Reactions                    48
Balance                      48
ShotPower                    48
Jumping                      48
Stamina                      48
Strength                     48
LongShots                    48
Aggression                   48
Interceptions                48
Positioning                  48
Vision                       48
Penalties                    48
Composure                    48
Marking                      48
StandingTackle               48
SlidingTackle                48
dtype: int64

Since only a small share of values is missing (at most 241 per column out of 18,207 rows), let's drop the affected rows.
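Because one row can be missing several columns at once, the per-column totals above do not directly say how many rows `dropna` will remove. A minimal sketch on a made-up four-row frame (`sample` and its values are hypothetical, not taken from data.csv) shows how to count the affected rows before dropping them:

```python
import numpy as np
import pandas as pd

# Hypothetical mini-dataset: one player missing a club, one missing every skill attribute.
sample = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Club": ["X", None, "Y", "Z"],
    "Crossing": [50.0, 60.0, np.nan, 70.0],
    "Finishing": [40.0, 55.0, np.nan, 65.0],
})

missing_per_column = sample.isnull().sum()          # what df.isnull().sum() reports
rows_to_drop = sample.isnull().any(axis=1).sum()    # rows that dropna(axis=0) would remove
share_dropped = rows_to_drop / len(sample)

print(missing_per_column.to_dict())
print(f"{rows_to_drop} of {len(sample)} rows ({share_dropped:.0%}) would be dropped")
```

Here the column totals add up to 3 while only 2 rows are affected. On the real data, `df.info()` below reports 17,918 of the original 18,207 rows, so 289 rows (about 1.6%) were dropped.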

In [8]:
df.dropna(axis = 0, inplace = True)
In [9]:
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 17918 entries, 0 to 18206
Data columns (total 47 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   ID                        17918 non-null  int64  
 1   Name                      17918 non-null  object 
 2   Age                       17918 non-null  int64  
 3   Nationality               17918 non-null  object 
 4   Overall                   17918 non-null  int64  
 5   Potential                 17918 non-null  int64  
 6   Club                      17918 non-null  object 
 7   Value                     17918 non-null  object 
 8   Wage                      17918 non-null  object 
 9   Special                   17918 non-null  int64  
 10  Preferred Foot            17918 non-null  object 
 11  International Reputation  17918 non-null  float64
 12  Weak Foot                 17918 non-null  float64
 13  Skill Moves               17918 non-null  float64
 14  Work Rate                 17918 non-null  object 
 15  Position                  17918 non-null  object 
 16  Height                    17918 non-null  object 
 17  Weight                    17918 non-null  object 
 18  Crossing                  17918 non-null  float64
 19  Finishing                 17918 non-null  float64
 20  HeadingAccuracy           17918 non-null  float64
 21  ShortPassing              17918 non-null  float64
 22  Volleys                   17918 non-null  float64
 23  Dribbling                 17918 non-null  float64
 24  Curve                     17918 non-null  float64
 25  FKAccuracy                17918 non-null  float64
 26  LongPassing               17918 non-null  float64
 27  BallControl               17918 non-null  float64
 28  Acceleration              17918 non-null  float64
 29  SprintSpeed               17918 non-null  float64
 30  Agility                   17918 non-null  float64
 31  Reactions                 17918 non-null  float64
 32  Balance                   17918 non-null  float64
 33  ShotPower                 17918 non-null  float64
 34  Jumping                   17918 non-null  float64
 35  Stamina                   17918 non-null  float64
 36  Strength                  17918 non-null  float64
 37  LongShots                 17918 non-null  float64
 38  Aggression                17918 non-null  float64
 39  Interceptions             17918 non-null  float64
 40  Positioning               17918 non-null  float64
 41  Vision                    17918 non-null  float64
 42  Penalties                 17918 non-null  float64
 43  Composure                 17918 non-null  float64
 44  Marking                   17918 non-null  float64
 45  StandingTackle            17918 non-null  float64
 46  SlidingTackle             17918 non-null  float64
dtypes: float64(32), int64(5), object(10)
memory usage: 6.6+ MB

Cool!

Checking for duplicates.

In [10]:
df.duplicated().sum()
Out[10]:
0

No duplicated rows found!
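`df.duplicated()` only flags rows that are identical in every column. A stricter check looks for repeated player `ID`s, which would also catch the same player entered twice with slightly different values. A sketch on made-up rows (hypothetical data, not from data.csv):

```python
import pandas as pd

# Hypothetical rows: the same player (ID 1) appears twice with a different Value.
sample = pd.DataFrame({
    "ID": [1, 2, 1],
    "Name": ["A", "B", "A"],
    "Value": ["€1M", "€500K", "€1.2M"],
})

full_row_dups = sample.duplicated().sum()        # rows identical in every column
id_dups = sample.duplicated(subset="ID").sum()   # repeated player IDs, whatever else differs

print(full_row_dups, id_dups)
```

Running `df.duplicated(subset='ID').sum()` on the notebook's `df` would apply the same stricter check; its result is not shown here.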

Here is a crucial point to consider before further analysis: every money-related column is stored as text, with a '€' sign and a 'K' suffix for thousands or an 'M' suffix for millions of euros. Next, I will convert these columns into numeric values so they can be used later in the analysis.

Values converter

In [11]:
def valueConverter(dataframe, col, currency = "€"):
    '''Take a dataframe and the name of a money column stored as strings; return a new dataframe with that column converted to numbers.
    Note: rows whose value has neither a 'K' nor an 'M' suffix are dropped, and the index is reset.'''
    dataframe[col] = dataframe[col].str.replace(currency, "")
    inKs = dataframe[dataframe[col].str.contains('K')].copy()   #--> values described in thousands (.copy() avoids SettingWithCopyWarning).
    inMs = dataframe[dataframe[col].str.contains("M")].copy()   #--> values described in millions.
    inKs[col] = inKs[col].str.replace("K", "").astype(float) * 1000
    inMs[col] = inMs[col].str.replace("M", '').astype(float) * 1000000
    return pd.concat([inKs, inMs], ignore_index = True)
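`valueConverter` keeps only values that contain 'K' or 'M', so any entry without a suffix (for example a plain '€0') is silently dropped, which is why the row count falls from 17,918 to 17,907 in the output below; `pd.concat` also reorders the rows and resets the index. A vectorised alternative that converts every value and keeps the original index could look like this (a sketch, not the notebook's method; `parse_money` is a hypothetical helper, and it assumes each entry is the currency sign, a number and an optional K/M suffix):

```python
import pandas as pd

def parse_money(values: pd.Series, currency: str = "€") -> pd.Series:
    """Convert strings such as '€110.5M', '€565K' or '€0' into floats.

    Assumes each entry is the currency symbol, a number and an optional
    'K' (thousands) or 'M' (millions) suffix. The index is left untouched.
    """
    cleaned = values.str.replace(currency, "", regex=False).str.strip()
    multipliers = {"K": 1_000, "M": 1_000_000}
    # The last character is the suffix if there is one; otherwise the factor is 1.
    factor = cleaned.str[-1].map(multipliers).fillna(1)
    number = cleaned.str.rstrip("KM").astype(float)
    return number * factor

wages = pd.Series(["€565K", "€0", "€1.5M"], index=[10, 11, 12])
print(parse_money(wages))
```

In the notebook this would replace the two `valueConverter` calls with `df['Wage'] = parse_money(df['Wage'])` and `df['Value'] = parse_money(df['Value'])`, keeping all rows in their original order.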
In [12]:
#--> Let's for now convert Value and Wage columns.
In [13]:
df = valueConverter(df, 'Wage')
df = valueConverter(df, "Value")
df
Out[13]:
ID Name Age Nationality Overall Potential Club Value Wage Special Preferred Foot International Reputation Weak Foot Skill Moves Work Rate Position Height Weight Crossing Finishing HeadingAccuracy ShortPassing Volleys Dribbling Curve FKAccuracy LongPassing BallControl Acceleration SprintSpeed Agility Reactions Balance ShotPower Jumping Stamina Strength LongShots Aggression Interceptions Positioning Vision Penalties Composure Marking StandingTackle SlidingTackle
0 135451 Gomes 37 Brazil 77 77 Watford 600000.0 25000.0 1249 Right 2.0 2.0 1.0 Medium/ Medium GK 6'3 209lbs 15.0 15.0 14.0 23.0 13.0 15.0 12.0 13.0 19.0 25.0 50.0 52.0 56.0 71.0 53.0 30.0 76.0 27.0 64.0 14.0 40.0 20.0 12.0 59.0 41.0 59.0 19.0 14.0 15.0
1 14907 A. Bizzarri 40 Argentina 76 76 Foggia 525000.0 2000.0 1198 Right 2.0 3.0 1.0 Medium/ Medium GK 6'2 196lbs 11.0 17.0 10.0 27.0 19.0 18.0 19.0 18.0 26.0 23.0 55.0 45.0 53.0 68.0 51.0 19.0 68.0 31.0 55.0 19.0 40.0 19.0 10.0 49.0 20.0 60.0 11.0 12.0 11.0
2 45595 D. Bonera 37 Italy 75 75 Villarreal CF 900000.0 18000.0 1490 Right 2.0 3.0 2.0 Low/ High CB 6'0 185lbs 58.0 23.0 74.0 60.0 19.0 41.0 38.0 29.0 61.0 51.0 33.0 30.0 34.0 68.0 48.0 60.0 74.0 34.0 77.0 53.0 80.0 79.0 27.0 41.0 49.0 77.0 82.0 78.0 74.0
3 140082 Rafael 36 Brazil 75 75 Cagliari 900000.0 14000.0 1076 Right 2.0 3.0 1.0 Medium/ Medium GK 6'2 176lbs 13.0 11.0 10.0 25.0 10.0 13.0 11.0 12.0 24.0 23.0 43.0 31.0 31.0 70.0 23.0 21.0 63.0 41.0 49.0 14.0 31.0 10.0 12.0 45.0 22.0 60.0 20.0 19.0 17.0
4 114764 Iraizoz 37 Spain 75 75 Girona FC 450000.0 13000.0 1120 Right 2.0 3.0 1.0 Medium/ Medium GK 6'3 196lbs 11.0 12.0 13.0 29.0 16.0 11.0 12.0 13.0 20.0 15.0 32.0 33.0 28.0 73.0 52.0 20.0 47.0 33.0 83.0 14.0 38.0 23.0 16.0 53.0 13.0 56.0 15.0 11.0 12.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
17902 239948 S. Reyes 20 Mexico 64 78 Monarcas Morelia 1000000.0 2000.0 1508 Left 1.0 3.0 3.0 High/ Medium LW 5'9 157lbs 67.0 64.0 37.0 55.0 49.0 73.0 48.0 48.0 41.0 67.0 79.0 75.0 67.0 54.0 66.0 67.0 47.0 51.0 52.0 43.0 29.0 19.0 59.0 51.0 58.0 54.0 27.0 28.0 26.0
17903 241266 W. Geubbels 16 France 64 86 AS Monaco 1000000.0 5000.0 1576 Right 1.0 3.0 3.0 Medium/ Low ST 6'1 159lbs 52.0 65.0 56.0 55.0 66.0 62.0 57.0 43.0 47.0 63.0 85.0 84.0 76.0 62.0 60.0 61.0 60.0 53.0 59.0 57.0 36.0 21.0 62.0 56.0 58.0 63.0 27.0 21.0 19.0
17904 229516 Mayoral 21 Spain 64 79 AD Alcorcón 1000000.0 3000.0 1602 Right 1.0 4.0 3.0 Medium/ Medium RM 5'9 154lbs 64.0 61.0 39.0 59.0 39.0 69.0 67.0 63.0 55.0 65.0 87.0 79.0 87.0 45.0 83.0 58.0 44.0 65.0 42.0 52.0 45.0 18.0 48.0 50.0 60.0 56.0 33.0 31.0 34.0
17905 232104 D. James 20 Wales 64 79 Swansea City 1000000.0 5000.0 1567 Left 1.0 2.0 3.0 High/ Medium LW 5'7 168lbs 62.0 58.0 44.0 64.0 50.0 72.0 53.0 44.0 48.0 63.0 86.0 83.0 79.0 44.0 78.0 71.0 53.0 60.0 58.0 40.0 39.0 21.0 53.0 60.0 55.0 56.0 23.0 27.0 31.0
17906 225738 Kuki Zalazar 20 Spain 64 80 Real Valladolid CF 1000000.0 4000.0 1582 Left 1.0 3.0 2.0 Medium/ Medium RW 5'9 161lbs 65.0 59.0 54.0 64.0 47.0 71.0 74.0 70.0 56.0 69.0 65.0 63.0 55.0 52.0 70.0 71.0 61.0 38.0 46.0 65.0 30.0 27.0 62.0 59.0 59.0 45.0 29.0 21.0 14.0

17907 rows × 47 columns

The Height and Weight columns also need cleaning to put them in a format suitable for analysis.

In [14]:
df.Height = df.Height.str.replace("'", ".").astype(float) * 30.48 #--> 1 ft = 30.48 cm. CAUTION: "5'10" is read as 5.10 ft here, not 5 ft 10 in (about 5.83 ft), so converted heights are distorted.
df.Weight = df.Weight.str.replace('lbs', "").astype(float) #--> Let's keep weight in pounds (lbs).
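Replacing the apostrophe with a decimal point treats 5'10 as 5.10 ft rather than 5 ft 10 in, so 5'1 and 5'10 both become 155.448 cm and 5'11 becomes 155.7528 cm, which are exactly the minimum and 25th percentile in the Height summary further down. A sketch of a conversion that splits feet and inches instead (`feet_inches_to_cm` is a hypothetical helper; it assumes every height has the form feet'inches):

```python
import pandas as pd

def feet_inches_to_cm(heights: pd.Series) -> pd.Series:
    """Convert strings like "5'10" (5 ft 10 in) into centimetres."""
    parts = heights.str.split("'", expand=True).astype(float)
    feet, inches = parts[0], parts[1]
    return feet * 30.48 + inches * 2.54   # 1 ft = 30.48 cm, 1 in = 2.54 cm

heights = pd.Series(["5'10", "5'1", "6'0"])
print(feet_inches_to_cm(heights).round(2).tolist())   # [177.8, 154.94, 182.88]

# The decimal-point approach maps 5'10 and 5'1 to the same value.
naive = heights.str.replace("'", ".", regex=False).astype(float) * 30.48
print(naive.round(3).tolist())
```

With this helper, the first line of the cell would become `df.Height = feet_inches_to_cm(df.Height)`; the outputs shown in this notebook were produced with the original conversion.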
In [15]:
df
Out[15]:
ID Name Age Nationality Overall Potential Club Value Wage Special Preferred Foot International Reputation Weak Foot Skill Moves Work Rate Position Height Weight Crossing Finishing HeadingAccuracy ShortPassing Volleys Dribbling Curve FKAccuracy LongPassing BallControl Acceleration SprintSpeed Agility Reactions Balance ShotPower Jumping Stamina Strength LongShots Aggression Interceptions Positioning Vision Penalties Composure Marking StandingTackle SlidingTackle
0 135451 Gomes 37 Brazil 77 77 Watford 600000.0 25000.0 1249 Right 2.0 2.0 1.0 Medium/ Medium GK 192.024 209.0 15.0 15.0 14.0 23.0 13.0 15.0 12.0 13.0 19.0 25.0 50.0 52.0 56.0 71.0 53.0 30.0 76.0 27.0 64.0 14.0 40.0 20.0 12.0 59.0 41.0 59.0 19.0 14.0 15.0
1 14907 A. Bizzarri 40 Argentina 76 76 Foggia 525000.0 2000.0 1198 Right 2.0 3.0 1.0 Medium/ Medium GK 188.976 196.0 11.0 17.0 10.0 27.0 19.0 18.0 19.0 18.0 26.0 23.0 55.0 45.0 53.0 68.0 51.0 19.0 68.0 31.0 55.0 19.0 40.0 19.0 10.0 49.0 20.0 60.0 11.0 12.0 11.0
2 45595 D. Bonera 37 Italy 75 75 Villarreal CF 900000.0 18000.0 1490 Right 2.0 3.0 2.0 Low/ High CB 182.880 185.0 58.0 23.0 74.0 60.0 19.0 41.0 38.0 29.0 61.0 51.0 33.0 30.0 34.0 68.0 48.0 60.0 74.0 34.0 77.0 53.0 80.0 79.0 27.0 41.0 49.0 77.0 82.0 78.0 74.0
3 140082 Rafael 36 Brazil 75 75 Cagliari 900000.0 14000.0 1076 Right 2.0 3.0 1.0 Medium/ Medium GK 188.976 176.0 13.0 11.0 10.0 25.0 10.0 13.0 11.0 12.0 24.0 23.0 43.0 31.0 31.0 70.0 23.0 21.0 63.0 41.0 49.0 14.0 31.0 10.0 12.0 45.0 22.0 60.0 20.0 19.0 17.0
4 114764 Iraizoz 37 Spain 75 75 Girona FC 450000.0 13000.0 1120 Right 2.0 3.0 1.0 Medium/ Medium GK 192.024 196.0 11.0 12.0 13.0 29.0 16.0 11.0 12.0 13.0 20.0 15.0 32.0 33.0 28.0 73.0 52.0 20.0 47.0 33.0 83.0 14.0 38.0 23.0 16.0 53.0 13.0 56.0 15.0 11.0 12.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
17902 239948 S. Reyes 20 Mexico 64 78 Monarcas Morelia 1000000.0 2000.0 1508 Left 1.0 3.0 3.0 High/ Medium LW 179.832 157.0 67.0 64.0 37.0 55.0 49.0 73.0 48.0 48.0 41.0 67.0 79.0 75.0 67.0 54.0 66.0 67.0 47.0 51.0 52.0 43.0 29.0 19.0 59.0 51.0 58.0 54.0 27.0 28.0 26.0
17903 241266 W. Geubbels 16 France 64 86 AS Monaco 1000000.0 5000.0 1576 Right 1.0 3.0 3.0 Medium/ Low ST 185.928 159.0 52.0 65.0 56.0 55.0 66.0 62.0 57.0 43.0 47.0 63.0 85.0 84.0 76.0 62.0 60.0 61.0 60.0 53.0 59.0 57.0 36.0 21.0 62.0 56.0 58.0 63.0 27.0 21.0 19.0
17904 229516 Mayoral 21 Spain 64 79 AD Alcorcón 1000000.0 3000.0 1602 Right 1.0 4.0 3.0 Medium/ Medium RM 179.832 154.0 64.0 61.0 39.0 59.0 39.0 69.0 67.0 63.0 55.0 65.0 87.0 79.0 87.0 45.0 83.0 58.0 44.0 65.0 42.0 52.0 45.0 18.0 48.0 50.0 60.0 56.0 33.0 31.0 34.0
17905 232104 D. James 20 Wales 64 79 Swansea City 1000000.0 5000.0 1567 Left 1.0 2.0 3.0 High/ Medium LW 173.736 168.0 62.0 58.0 44.0 64.0 50.0 72.0 53.0 44.0 48.0 63.0 86.0 83.0 79.0 44.0 78.0 71.0 53.0 60.0 58.0 40.0 39.0 21.0 53.0 60.0 55.0 56.0 23.0 27.0 31.0
17906 225738 Kuki Zalazar 20 Spain 64 80 Real Valladolid CF 1000000.0 4000.0 1582 Left 1.0 3.0 2.0 Medium/ Medium RW 179.832 161.0 65.0 59.0 54.0 64.0 47.0 71.0 74.0 70.0 56.0 69.0 65.0 63.0 55.0 52.0 70.0 71.0 61.0 38.0 46.0 65.0 30.0 27.0 62.0 59.0 59.0 45.0 29.0 21.0 14.0

17907 rows × 47 columns

All set!

Univariate Data Exploration

In this section, my analysis will focus on single variables: their distributions and rankings.

Let's start with an overview: how are ages distributed among professional soccer players?

In [16]:
df.Age.describe()
Out[16]:
count    17907.000000
mean        25.095605
std          4.660388
min         16.000000
25%         21.000000
50%         25.000000
75%         28.000000
max         45.000000
Name: Age, dtype: float64
In [17]:
bins = range(15, 46, 5)
plt.hist(data = df, x = 'Age', bins = bins, ec = 'black', color = 'orange');
plt.title("Players' Age Distribution");
plt.xlabel("Age");
plt.ylabel("Frequency")
Out[17]:
Text(0, 0.5, 'Frequency')

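Well, the histogram peaks between 20 and 25 years, and half of the players are between 21 and 28 years old (the interquartile range in the summary above).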
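To put exact numbers on the histogram, the ages can be binned with `pd.cut` and counted as shares. A sketch on made-up ages (`age_band_shares` is a hypothetical helper); note that `pd.cut` uses right-closed bands such as (20, 25], whereas `plt.hist` uses left-closed ones such as [20, 25), so players exactly at a boundary age can land in neighbouring bands:

```python
import pandas as pd

def age_band_shares(ages: pd.Series, edges) -> pd.Series:
    """Fraction of players in each age band.

    pd.cut uses right-closed bands, e.g. (20, 25], while plt.hist uses
    left-closed ones, e.g. [20, 25), so boundary ages can land in different bands.
    """
    bands = pd.cut(ages, bins=edges)
    return bands.value_counts(normalize=True, sort=False)

# Hypothetical ages, only to illustrate the output.
ages = pd.Series([17, 19, 21, 22, 23, 24, 26, 29, 31, 35])
shares = age_band_shares(ages, edges=list(range(15, 46, 5)))
print(shares)
```

On the notebook's data, `age_band_shares(df.Age, list(range(15, 46, 5)))` would give the actual share of players in each band.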

Say we have a child and want to check whether becoming a soccer player would be a good profession; getting an idea of players' market values and wages would be useful here.

In [18]:
df.Value.describe()
Out[18]:
count    1.790700e+04
mean     2.450133e+06
std      5.633207e+06
min      1.000000e+04
25%      3.250000e+05
50%      7.000000e+05
75%      2.100000e+06
max      1.185000e+08
Name: Value, dtype: float64
In [19]:
bins = np.logspace(3, 8, 30)
plt.hist(data = df, x = "Wage", bins = bins, ec = 'white')
plt.xscale('log')
plt.xlim(1000, 1e6)
xticks = np.logspace(3, 8, 30)
xlabels = ["{:.01f}".format(x) for x in xticks]
plt.xticks( xticks, xlabels, rotation = -90);
plt.title("Wages Distribution");
plt.xlabel("Wage [€]");
plt.ylabel("Frequency");

Well, maybe this wasn't that insightful: most players seem to be paid somewhere around €1,000 to €7,200, so that child would need to excel to reach the top of the pay scale!
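With a right-skewed column like Wage, a few percentiles often summarise the spread more plainly than histogram tick labels, since the mean is pulled up by a handful of very large wages. A sketch on made-up numbers (`wage_percentiles` is a hypothetical helper; pandas interpolates linearly between values by default):

```python
import pandas as pd

def wage_percentiles(wages: pd.Series, qs=(0.25, 0.5, 0.75, 0.9, 0.99)) -> pd.Series:
    """Selected percentiles of a numeric wage column (linear interpolation)."""
    return wages.quantile(list(qs))

# Hypothetical wages in euros: mostly modest, one star salary.
wages = pd.Series([1_000, 1_000, 2_000, 3_000, 5_000, 8_000, 20_000, 565_000], dtype=float)
print(wage_percentiles(wages))
print("mean:", wages.mean(), "median:", wages.median())
```

Here the single large wage lifts the mean to 75,625 while the median stays at 4,000. `wage_percentiles(df.Wage)` on the cleaned notebook data would give the real figures; they are not reproduced here.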

Now let's take a look at height and weight distributions.

In [20]:
df.Weight.describe()
Out[20]:
count    17907.000000
mean       165.964316
std         15.602524
min        110.000000
25%        154.000000
50%        165.000000
75%        176.000000
max        243.000000
Name: Weight, dtype: float64
In [21]:
bins = range(100, 245, 5)
plt.hist(data = df, x = 'Weight', bins = bins, ec = 'black', color = 'orange');
plt.title("Weight Distribution");
plt.xlabel("Weight[lbs]");
plt.ylabel("Frequency");

Most players weigh roughly 150 to 175 lbs; the middle half of them weigh between 154 and 176 lbs.
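For readers more used to kilograms, the pound figures convert with a single multiplication, since the international pound is defined as exactly 0.45359237 kg. A small sketch (`lbs_to_kg` is a hypothetical helper; the notebook itself deliberately keeps pounds):

```python
import pandas as pd

LB_TO_KG = 0.45359237  # the international pound is defined as exactly 0.45359237 kg

def lbs_to_kg(weights_lbs: pd.Series) -> pd.Series:
    """Convert weights in pounds to kilograms."""
    return weights_lbs * LB_TO_KG

# The 25th and 75th percentiles from the Weight summary above.
print(lbs_to_kg(pd.Series([154.0, 176.0])).round(1).tolist())   # [69.9, 79.8]
```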

In [22]:
df.Height.describe()
Out[22]:
count    17907.000000
mean       176.692613
std         13.666913
min        155.448000
25%        155.752800
50%        179.832000
75%        185.928000
max        210.312000
Name: Height, dtype: float64
In [23]:
bins = range(150, 215, 5)
plt.hist(data = df, x = 'Height', bins = bins, ec = 'black', color = 'orange');
plt.title("Height Distribution");
plt.xlabel("Height[cms]");
plt.ylabel("Frequency");

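The histogram shows over 4,000 players in the 155 to 160 cm bin and another large group at 185 to 190 cm. Treat these heights with caution: the conversion in cell [14] reads 5'10 as 5.10 ft instead of 5 ft 10 in, so 5'10 and 5'1 both become 155.448 cm and 5'11 becomes 155.7528 cm, exactly the minimum and 25th percentile in the summary above. The low cluster is therefore most likely an artifact of the conversion rather than a real group of short players.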

Actually, I like top 10s, so let's find some top 10s for players and clubs!

Top 10 overall performing players.

Based on overall scores!

In [24]:
top10players = df[['Name', 'Overall']].sort_values(['Overall'], ascending = False)[:10]
sb.barplot(data = top10players, y = 'Name', x = "Overall");
plt.xlim(90, 95);
plt.title('Top 10 Performing Players');

Well, it looks like Messi and Ronaldo, followed by Neymar, have the best overall ratings.

In [25]:
top10valued = df[['Name', 'Value']].sort_values(['Value'], ascending = False)[:10]
sb.barplot(data = top10valued, y = "Name", x = "Value");
plt.xlim(.5e8, 1.3e8);
plt.title("TOP 10 VALUED PLAYERS")
Out[25]:
Text(0.5, 1.0, 'TOP 10 VALUED PLAYERS')

Is that why Neymar always changes his hair color?

Top 10 clubs based on their players' average overall rating.

In [72]:
top10clubs = df[['Club', "Overall"]].groupby("Club").mean().sort_values('Overall',ascending = False)[:10].reset_index()

sb.barplot(data = top10clubs, y = 'Club', x = 'Overall');
plt.xlim(70, 85);
plt.title("TOP 10 PERFORMING CLUBS");

Ohh, Italian clubs have invaded this ranking!

Top paying clubs

In [27]:
top10paying = df[['Club', "Wage"]].groupby("Club").mean().sort_values("Wage", ascending = False).reset_index()[:10]
plt.bar(top10paying.Club, top10paying.Wage)
plt.xticks(rotation = -90);
plt.title('TOP 10 PAYING CLUBS');

Mean wages are sensitive to outliers (one or two very highly paid players can inflate a club's average), so let's leave this for now!
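Here is a quick illustration of why the mean can mislead. It uses hypothetical wages for two made-up clubs. A single very high salary moves the mean a lot but barely affects the median.

```python
import pandas as pd

# Hypothetical wages; club "X" has one superstar salary
wages = pd.DataFrame({
    "Club": ["X"] * 5 + ["Y"] * 5,
    "Wage": [10_000, 12_000, 11_000, 9_000, 500_000,
             40_000, 45_000, 50_000, 42_000, 48_000],
})

# By mean, X looks like the better payer; by median, Y does
summary = wages.groupby("Club")["Wage"].agg(["mean", "median"])
print(summary)
```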

In this section I focus on the relationships between different variables.

Let's begin by checking the relationship between Overall and each ability with a large grid of scatter plots!

In [28]:
df.columns
Out[28]:
Index(['ID', 'Name', 'Age', 'Nationality', 'Overall', 'Potential', 'Club',
       'Value', 'Wage', 'Special', 'Preferred Foot',
       'International Reputation', 'Weak Foot', 'Skill Moves', 'Work Rate',
       'Position', 'Height', 'Weight', 'Crossing', 'Finishing',
       'HeadingAccuracy', 'ShortPassing', 'Volleys', 'Dribbling', 'Curve',
       'FKAccuracy', 'LongPassing', 'BallControl', 'Acceleration',
       'SprintSpeed', 'Agility', 'Reactions', 'Balance', 'ShotPower',
       'Jumping', 'Stamina', 'Strength', 'LongShots', 'Aggression',
       'Interceptions', 'Positioning', 'Vision', 'Penalties', 'Composure',
       'Marking', 'StandingTackle', 'SlidingTackle'],
      dtype='object')
In [29]:
vars_ = ['Overall', 'Crossing', 'Finishing',
       'HeadingAccuracy', 'ShortPassing', 'Volleys', 'Dribbling', 'Curve',
       'FKAccuracy', 'LongPassing', 'BallControl', 'Acceleration',
       'SprintSpeed', 'Agility', 'Reactions', 'Balance', 'ShotPower',
       'Jumping', 'Stamina', 'Strength', 'LongShots', 'Aggression',
       'Interceptions', 'Positioning', 'Vision', 'Penalties', 'Composure',
       'Marking', 'StandingTackle', 'SlidingTackle']
In [30]:
len(vars_)
Out[30]:
30
In [31]:
fig, axes = plt.subplots(nrows= 10, ncols= 3, figsize = (20, 60))
# One scatter plot of Overall against each ability on a 10 x 3 grid
for ax, var in zip(axes.flat, np.ravel(vars_)):
    ax.scatter(data = df, y = "Overall", x = var, alpha = .1)
    ax.set_xlabel(var)
    ax.set_ylabel("Overall")

The plots above show a strong positive relationship between Overall and both Reactions and Composure, and moderate positive relationships between Overall and ShortPassing, LongPassing, BallControl, ShotPower, and Vision.
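The visual reading of the scatter grid can be backed by a ranked list of Pearson correlations with Overall. This sketch uses synthetic data: Overall is built from Reactions plus noise, and Jumping is unrelated. On the real data, the same chain would run on `df[...]` with the ability columns listed above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500
# Synthetic abilities: Overall depends on Reactions, not on Jumping
reactions = rng.normal(60, 10, n)
jumping = rng.normal(65, 10, n)
overall = 0.8 * reactions + rng.normal(0, 4, n)
toy = pd.DataFrame({"Overall": overall, "Reactions": reactions, "Jumping": jumping})

# Correlation of every ability with Overall, strongest first
corr_with_overall = toy.corr()["Overall"].drop("Overall").sort_values(ascending=False)
print(corr_with_overall)
```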

In [32]:
plt.figure(figsize = (20, 20))
sb.heatmap(df[['Overall', 'Crossing', 'Finishing',
       'HeadingAccuracy', 'ShortPassing', 'Volleys', 'Dribbling', 'Curve',
       'FKAccuracy', 'LongPassing', 'BallControl', 'Acceleration',
       'SprintSpeed', 'Agility', 'Reactions', 'Balance', 'ShotPower',
       'Jumping', 'Stamina', 'Strength', 'LongShots', 'Aggression',
       'Interceptions', 'Positioning', 'Vision', 'Penalties', 'Composure',
       'Marking', 'StandingTackle', 'SlidingTackle']].corr(), annot= True);
plt.title("Pairwise correlations between soccer abilities");

A close look at the heatmap also shows how strongly each pair of abilities is correlated.
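With 30 abilities, the annotated heatmap is hard to scan, so one option is to list the most correlated pairs. This sketch uses synthetic columns and keeps only the upper triangle of the correlation matrix, so each pair appears once.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 300
base = rng.normal(size=n)
# Synthetic abilities: the two tackle columns share a common component
toy = pd.DataFrame({
    "StandingTackle": base + rng.normal(scale=0.3, size=n),
    "SlidingTackle": base + rng.normal(scale=0.3, size=n),
    "Finishing": rng.normal(size=n),
})

corr = toy.corr()
# Upper triangle only (k=1 also drops the diagonal of 1s)
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
# Flatten to (ability, ability) -> r and rank the pairs
pairs = upper.stack().dropna().sort_values(ascending=False)
print(pairs)
```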

What about the same approach for these abilities against Value?

In [33]:
fig, axes = plt.subplots(nrows= 10, ncols= 3, figsize = (20, 60))
# Same grid as before, now against Value on a log scale
for ax, var in zip(axes.flat, np.ravel(vars_)):
    ax.scatter(data = df, y = "Value", x = var, alpha = .1)
    ax.set_xlabel(var)
    ax.set_ylabel("Value")
    ax.set_yscale("log")

Value shows a strong relationship with Overall, Composure, Reactions, BallControl, and Finishing, and weak to moderate relationships with the other abilities.
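Value is heavily skewed, which is why the y-axis above is logarithmic. Because of that skew, a Pearson coefficient on raw Value can understate a relationship that is strong but nonlinear. One alternative, not used in the notebook itself, is Spearman's rank correlation, which is unchanged by a log transform. This sketch uses synthetic data in which Value grows exponentially with Overall.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 400
overall = rng.uniform(50, 95, n)
# Synthetic Value growing exponentially with Overall (made-up shape and scale)
value = np.exp(0.15 * overall) * rng.lognormal(mean=0.0, sigma=0.3, size=n)
toy = pd.DataFrame({"Overall": overall, "Value": value, "LogValue": np.log(value)})

pearson = toy.corr()                     # Pearson is the default method
spearman = toy.corr(method="spearman")   # rank-based, so monotone transforms don't change it

print("Pearson, raw Value: ", round(pearson.loc["Overall", "Value"], 3))
print("Pearson, log Value: ", round(pearson.loc["Overall", "LogValue"], 3))
print("Spearman, raw Value:", round(spearman.loc["Overall", "Value"], 3))
print("Spearman, log Value:", round(spearman.loc["Overall", "LogValue"], 3))
```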

Let's now check the relationship between Overall and Height, then between Overall and Weight.

In [34]:
plt.figure(figsize = (20, 8))
plt.subplot(1, 2, 1)
plt.hist2d(data = df, x = 'Height', y = 'Overall');
plt.title("Effect of Height on the Overall Performance");
plt.colorbar(label = "Count");
plt.subplot(1, 2, 2)
plt.hist2d(data = df, x = 'Weight', y = 'Overall');
plt.colorbar(label = 'Count');
plt.title("Effect of Weight on the Overall Performance");
In [35]:
sb.heatmap(df[["Overall", "Height", "Weight"]].corr(), annot= True);
plt.title("Correlation Coefficients Between Height, Weight, and Overall");

The 2D histograms and the correlation heatmap show that Overall is only weakly related to Height and to Weight. Still, a noticeable cluster of players around 185 cm tall has Overall ratings of 60 to 70, and another cluster weighing 160 to 180 lbs falls in the same Overall range!
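The clusters above are read from the color scale. Since `plt.hist2d` bins the data with `np.histogram2d`, the densest cell can also be located numerically. This sketch uses synthetic heights and ratings with one planted cluster near 185 cm and Overall 65, not the real columns.

```python
import numpy as np

rng = np.random.default_rng(3)
# Synthetic players: a dense cluster near 185 cm / Overall 65 plus uniform background
height = np.concatenate([rng.normal(185, 2, 300), rng.uniform(160, 200, 100)])
overall = np.concatenate([rng.normal(65, 2, 300), rng.uniform(50, 90, 100)])

# 10 x 10 bins, matching the plt.hist2d default
counts, h_edges, o_edges = np.histogram2d(height, overall, bins=10)

# Find the busiest cell and report its ranges
i, j = np.unravel_index(np.argmax(counts), counts.shape)
print(f"Height {h_edges[i]:.0f}-{h_edges[i + 1]:.0f} cm, "
      f"Overall {o_edges[j]:.0f}-{o_edges[j + 1]:.0f}: {counts[i, j]:.0f} players")
```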

Now let's take the subset of players from the top 10 performing clubs and study them further!

In [36]:
top10clubs
Out[36]:
                  Club    Overall
0             Juventus  82.280000
1               Napoli  80.000000
2                Inter  79.750000
3          Real Madrid  78.242424
4                Milan  78.074074
5         FC Barcelona  78.030303
6  Paris Saint-Germain  77.433333
7                 Roma  77.423077
8    Manchester United  77.242424
9    FC Bayern München  77.000000
In [37]:
top10clubs.Club.values
Out[37]:
array(['Juventus', 'Napoli', 'Inter', 'Real Madrid', 'Milan',
       'FC Barcelona', 'Paris Saint-Germain', 'Roma', 'Manchester United',
       'FC Bayern München'], dtype=object)
In [38]:
# Keep only players whose club is one of the top 10 clubs
subDF = df[df.Club.isin(top10clubs.Club)]
subDF
Out[38]:
ID Name Age Nationality Overall Potential Club Value Wage Special Preferred Foot International Reputation Weak Foot Skill Moves Work Rate Position Height Weight Crossing Finishing HeadingAccuracy ShortPassing Volleys Dribbling Curve FKAccuracy LongPassing BallControl Acceleration SprintSpeed Agility Reactions Balance ShotPower Jumping Stamina Strength LongShots Aggression Interceptions Positioning Vision Penalties Composure Marking StandingTackle SlidingTackle
1361 219715 R. Di Gennaro 24 Italy 67 70 Inter 675000.0 11000.0 1068 Right 1.0 2.0 1.0 Medium/ Medium GK 188.976 176.0 11.0 14.0 15.0 26.0 19.0 18.0 12.0 17.0 18.0 25.0 40.0 46.0 36.0 64.0 41.0 22.0 60.0 39.0 50.0 11.0 21.0 22.0 12.0 41.0 25.0 67.0 4.0 19.0 16.0
2521 244363 Daniel Fuzato 20 Brazil 66 76 Roma 800000.0 7000.0 940 Right 1.0 3.0 1.0 Medium/ Medium GK 192.024 194.0 15.0 17.0 21.0 26.0 8.0 15.0 9.0 9.0 28.0 12.0 31.0 20.0 32.0 64.0 20.0 21.0 37.0 22.0 55.0 13.0 17.0 12.0 9.0 52.0 21.0 48.0 8.0 12.0 11.0
2828 201179 A. Donnarumma 27 Italy 66 67 Milan 500000.0 15000.0 936 Right 1.0 2.0 1.0 Medium/ Medium GK 195.072 212.0 13.0 10.0 10.0 11.0 11.0 19.0 16.0 12.0 11.0 23.0 13.0 22.0 30.0 62.0 35.0 22.0 54.0 19.0 75.0 13.0 17.0 10.0 10.0 33.0 21.0 63.0 18.0 19.0 11.0
3298 246608 Fidalgo 21 Spain 65 75 Real Madrid 875000.0 20000.0 1673 Right 1.0 3.0 3.0 High/ Medium CM 179.832 150.0 61.0 49.0 44.0 69.0 46.0 68.0 62.0 53.0 65.0 70.0 65.0 63.0 68.0 62.0 77.0 54.0 51.0 64.0 49.0 48.0 62.0 52.0 59.0 70.0 49.0 68.0 46.0 51.0 47.0
3773 237522 Jorge Cuenca 18 Spain 65 78 FC Barcelona 850000.0 11000.0 1506 Left 1.0 3.0 2.0 Medium/ Medium CB 188.976 168.0 40.0 27.0 67.0 65.0 31.0 29.0 29.0 26.0 58.0 55.0 57.0 61.0 59.0 65.0 64.0 55.0 69.0 66.0 60.0 36.0 67.0 66.0 21.0 45.0 44.0 57.0 62.0 66.0 63.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
17860 241467 F. Feuillassier 20 Argentina 65 80 Real Madrid 1200000.0 23000.0 1574 Right 1.0 4.0 3.0 Medium/ Medium LW 176.784 152.0 64.0 72.0 49.0 62.0 55.0 70.0 44.0 45.0 45.0 65.0 68.0 68.0 67.0 53.0 78.0 72.0 47.0 62.0 61.0 71.0 45.0 22.0 58.0 56.0 50.0 54.0 35.0 23.0 20.0
17872 240513 E. Hamilton 19 Scotland 65 78 Manchester United 1000000.0 11000.0 1734 Left 1.0 3.0 3.0 Medium/ High CM 185.928 165.0 57.0 58.0 64.0 69.0 52.0 61.0 54.0 50.0 64.0 65.0 60.0 55.0 62.0 61.0 60.0 70.0 61.0 66.0 68.0 68.0 70.0 59.0 59.0 63.0 51.0 54.0 57.0 54.0 53.0
17873 237698 C. Gribbin 19 England 65 80 Manchester United 1200000.0 11000.0 1669 Left 1.0 3.0 3.0 Medium/ Medium CAM 182.880 159.0 62.0 54.0 54.0 66.0 44.0 69.0 64.0 72.0 63.0 67.0 70.0 68.0 62.0 60.0 66.0 65.0 54.0 65.0 60.0 62.0 50.0 42.0 62.0 64.0 56.0 58.0 22.0 35.0 40.0
17878 241810 Chumi 19 Spain 65 82 FC Barcelona 1000000.0 11000.0 1523 Right 1.0 3.0 2.0 Medium/ Medium CB 185.928 172.0 38.0 30.0 60.0 62.0 33.0 50.0 34.0 36.0 51.0 60.0 57.0 61.0 49.0 60.0 61.0 47.0 74.0 62.0 67.0 36.0 55.0 62.0 34.0 41.0 50.0 62.0 70.0 65.0 63.0
17885 243627 Y. Adli 17 France 65 84 Paris Saint-Germain 1100000.0 7000.0 1721 Right 1.0 3.0 3.0 High/ Medium CM 185.928 172.0 62.0 46.0 60.0 67.0 50.0 74.0 60.0 50.0 65.0 72.0 62.0 61.0 65.0 56.0 57.0 67.0 59.0 59.0 63.0 53.0 53.0 51.0 59.0 65.0 55.0 65.0 59.0 57.0 55.0

285 rows × 47 columns

Let's see how different features vary among these clubs.

In [39]:
sb.pointplot(data = subDF, x = 'Club', y= 'Overall', order= top10clubs.Club);
plt.xticks(rotation = -90);
plt.title("TOP 10 PERFORMING CLUBS");

We obtained this ranking before, so let's dig deeper.

In [40]:
sb.pointplot(data = subDF, x = 'Club', y= 'Potential', order= top10clubs.Club);
plt.xticks(rotation = -90);
plt.title("POTENTIAL LEVEL OF TOP 10 PERFORMING CLUBS");

When it comes to Potential, Barcelona is a strong competitor to Juventus!

Let's see which club pays better. A box plot compares medians and draws outliers as separate points, so this time outliers won't distract us!

In [41]:
sb.boxplot(data = subDF, x = 'Club', y= 'Wage', order= top10clubs.Club);
plt.xticks(rotation = -90);
# Juventus median wage, drawn as a red reference line
jev_med = subDF[subDF.Club == "Juventus"].Wage.median()
plt.axhline(y = jev_med, color = 'red');
plt.title("FIVE-NUMBER SUMMARY OF WAGES FOR TOP 10 PERFORMING CLUBS");

Juventus, Barcelona, and Real Madrid pay similar median wages!
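The box plot draws each club's quartiles, so the same numbers can also be printed per club. This sketch uses hypothetical wages for two made-up clubs. Seaborn's whiskers stop at 1.5 × IQR by default, so the min and max here can lie beyond them; those points appear as outlier markers.

```python
import pandas as pd

# Hypothetical wages for two made-up clubs
wages = pd.DataFrame({
    "Club": ["A"] * 5 + ["B"] * 5,
    "Wage": [100, 150, 200, 250, 600, 40, 60, 80, 90, 120],
})

# Min, Q1, median, Q3, max per club
five_num = wages.groupby("Club")["Wage"].quantile([0, 0.25, 0.5, 0.75, 1.0]).unstack()
five_num.columns = ["min", "Q1", "median", "Q3", "max"]
print(five_num)
```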

Let's check how their heights and weights vary!

In [42]:
sb.boxplot(data = subDF, x = 'Club', y= 'Weight', order= top10clubs.Club);
plt.xticks(rotation = -90);
plt.title("WEIGHT DISTRIBUTION FOR TOP 10 PERFORMING CLUBS");

Barcelona's players look like they are on a strict diet!

In [43]:
sb.boxplot(data = subDF, x = 'Club', y= 'Height', order= top10clubs.Club);
plt.xticks(rotation = -90);
plt.title("HEIGHT DISTRIBUTION FOR TOP 10 PERFORMING CLUBS");

Players in Milan have a wide range of heights!

Here is a fun one: say these clubs are about to play a penalty shootout. Which team is most likely to win?

In [44]:
sb.violinplot(data = subDF, x = 'Club', y= 'Penalties', order= top10clubs.Club);
jev_med = subDF[subDF.Club == "Juventus"].Penalties.median()
plt.axhline(y = jev_med, color = 'red');
plt.xticks(rotation = -90);
plt.title("WHICH CLUB WOULD WIN A PENALTIES GAME");

All clubs have fairly close median values, but Juventus has the highest, followed by Bayern and Barcelona!

Multivariate Data Exploration

In this section I study the relationships among several variables at once!

In [45]:
df
Out[45]:
ID Name Age Nationality Overall Potential Club Value Wage Special Preferred Foot International Reputation Weak Foot Skill Moves Work Rate Position Height Weight Crossing Finishing HeadingAccuracy ShortPassing Volleys Dribbling Curve FKAccuracy LongPassing BallControl Acceleration SprintSpeed Agility Reactions Balance ShotPower Jumping Stamina Strength LongShots Aggression Interceptions Positioning Vision Penalties Composure Marking StandingTackle SlidingTackle
0 135451 Gomes 37 Brazil 77 77 Watford 600000.0 25000.0 1249 Right 2.0 2.0 1.0 Medium/ Medium GK 192.024 209.0 15.0 15.0 14.0 23.0 13.0 15.0 12.0 13.0 19.0 25.0 50.0 52.0 56.0 71.0 53.0 30.0 76.0 27.0 64.0 14.0 40.0 20.0 12.0 59.0 41.0 59.0 19.0 14.0 15.0
1 14907 A. Bizzarri 40 Argentina 76 76 Foggia 525000.0 2000.0 1198 Right 2.0 3.0 1.0 Medium/ Medium GK 188.976 196.0 11.0 17.0 10.0 27.0 19.0 18.0 19.0 18.0 26.0 23.0 55.0 45.0 53.0 68.0 51.0 19.0 68.0 31.0 55.0 19.0 40.0 19.0 10.0 49.0 20.0 60.0 11.0 12.0 11.0
2 45595 D. Bonera 37 Italy 75 75 Villarreal CF 900000.0 18000.0 1490 Right 2.0 3.0 2.0 Low/ High CB 182.880 185.0 58.0 23.0 74.0 60.0 19.0 41.0 38.0 29.0 61.0 51.0 33.0 30.0 34.0 68.0 48.0 60.0 74.0 34.0 77.0 53.0 80.0 79.0 27.0 41.0 49.0 77.0 82.0 78.0 74.0
3 140082 Rafael 36 Brazil 75 75 Cagliari 900000.0 14000.0 1076 Right 2.0 3.0 1.0 Medium/ Medium GK 188.976 176.0 13.0 11.0 10.0 25.0 10.0 13.0 11.0 12.0 24.0 23.0 43.0 31.0 31.0 70.0 23.0 21.0 63.0 41.0 49.0 14.0 31.0 10.0 12.0 45.0 22.0 60.0 20.0 19.0 17.0
4 114764 Iraizoz 37 Spain 75 75 Girona FC 450000.0 13000.0 1120 Right 2.0 3.0 1.0 Medium/ Medium GK 192.024 196.0 11.0 12.0 13.0 29.0 16.0 11.0 12.0 13.0 20.0 15.0 32.0 33.0 28.0 73.0 52.0 20.0 47.0 33.0 83.0 14.0 38.0 23.0 16.0 53.0 13.0 56.0 15.0 11.0 12.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
17902 239948 S. Reyes 20 Mexico 64 78 Monarcas Morelia 1000000.0 2000.0 1508 Left 1.0 3.0 3.0 High/ Medium LW 179.832 157.0 67.0 64.0 37.0 55.0 49.0 73.0 48.0 48.0 41.0 67.0 79.0 75.0 67.0 54.0 66.0 67.0 47.0 51.0 52.0 43.0 29.0 19.0 59.0 51.0 58.0 54.0 27.0 28.0 26.0
17903 241266 W. Geubbels 16 France 64 86 AS Monaco 1000000.0 5000.0 1576 Right 1.0 3.0 3.0 Medium/ Low ST 185.928 159.0 52.0 65.0 56.0 55.0 66.0 62.0 57.0 43.0 47.0 63.0 85.0 84.0 76.0 62.0 60.0 61.0 60.0 53.0 59.0 57.0 36.0 21.0 62.0 56.0 58.0 63.0 27.0 21.0 19.0
17904 229516 Mayoral 21 Spain 64 79 AD Alcorcón 1000000.0 3000.0 1602 Right 1.0 4.0 3.0 Medium/ Medium RM 179.832 154.0 64.0 61.0 39.0 59.0 39.0 69.0 67.0 63.0 55.0 65.0 87.0 79.0 87.0 45.0 83.0 58.0 44.0 65.0 42.0 52.0 45.0 18.0 48.0 50.0 60.0 56.0 33.0 31.0 34.0
17905 232104 D. James 20 Wales 64 79 Swansea City 1000000.0 5000.0 1567 Left 1.0 2.0 3.0 High/ Medium LW 173.736 168.0 62.0 58.0 44.0 64.0 50.0 72.0 53.0 44.0 48.0 63.0 86.0 83.0 79.0 44.0 78.0 71.0 53.0 60.0 58.0 40.0 39.0 21.0 53.0 60.0 55.0 56.0 23.0 27.0 31.0
17906 225738 Kuki Zalazar 20 Spain 64 80 Real Valladolid CF 1000000.0 4000.0 1582 Left 1.0 3.0 2.0 Medium/ Medium RW 179.832 161.0 65.0 59.0 54.0 64.0 47.0 71.0 74.0 70.0 56.0 69.0 65.0 63.0 55.0 52.0 70.0 71.0 61.0 38.0 46.0 65.0 30.0 27.0 62.0 59.0 59.0 45.0 29.0 21.0 14.0

17907 rows × 47 columns

Let's check the relationship between Position and Overall with respect to Preferred Foot!

In [46]:
plt.figure(figsize =(20, 8))
sb.pointplot(data = df, x = "Position", y = 'Overall', hue = "Preferred Foot", dodge = .2, linestyles = "");
plt.title("Overall Performance by Position and Preferred Foot");

Left-footed players in the Left Forward (LF) position have higher Overall ratings than right-footed ones!
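Each point in the point plot is a group mean, and a mean from only a few players is less reliable than one from many. This sketch uses hypothetical data to show how to print the group size next to the mean. It does not check whether left-footed LFs are actually rare in FIFA 19.

```python
import pandas as pd

# Hypothetical LF players: two left-footed, six right-footed
toy = pd.DataFrame({
    "Position": ["LF"] * 8,
    "Preferred Foot": ["Left", "Left"] + ["Right"] * 6,
    "Overall": [84, 82, 70, 72, 68, 75, 71, 69],
})

# Mean and number of players behind each point of the plot
stats = toy.groupby(["Position", "Preferred Foot"])["Overall"].agg(["mean", "count"])
print(stats)
```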

Let's check the same for Shot Power

In [47]:
plt.figure(figsize =(20, 8))
sb.pointplot(data = df, x = "Position", y = 'ShotPower', hue = "Preferred Foot", dodge = .2, linestyles = "");
plt.title("Shooting Power Change with Position with respect to Preferred Foot");

Forwards have the highest shot power, whether they are left- or right-footed!

Let's check how Overall changes with respect to Preferred Foot and Position!

In [48]:
plt.figure(figsize = (15, 10))
# Select Overall before averaging so non-numeric columns are not aggregated
cat_means = df.groupby(['Preferred Foot', 'Position'])['Overall'].mean()
cat_means = cat_means.reset_index(name = 'Overall')
cat_means = cat_means.pivot(index = 'Position', columns = 'Preferred Foot',
                            values = 'Overall');
sb.heatmap(cat_means, annot = True, fmt = '.3f',
           cbar_kws = {'label' : 'mean(Overall)'});
plt.title("Mean Overall Performance by Position and Preferred Foot");

This table suggests which footedness suits each position, based on mean Overall. For example, a left-footed player looks preferable at LAM and a right-footed one at RWB, although the differences are small.
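The groupby + `reset_index` + `pivot` chain above can be collapsed into a single `pivot_table` call. Selecting `['Overall']` before `.mean()` also matters, because recent pandas versions raise an error when asked to average non-numeric columns such as Name. This is a sketch on toy rows.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "Position": ["LAM", "LAM", "RWB", "RWB", "LAM", "RWB"],
    "Preferred Foot": ["Left", "Right", "Left", "Right", "Left", "Right"],
    "Overall": [72, 70, 66, 69, 74, 71],
    "Name": ["a", "b", "c", "d", "e", "f"],  # a non-numeric column
})

# Three-step version: groupby on Overall only, then reshape
step = toy.groupby(["Preferred Foot", "Position"])["Overall"].mean().reset_index(name="Overall")
via_pivot = step.pivot(index="Position", columns="Preferred Foot", values="Overall")

# One-call version producing the same Position x Preferred Foot table
via_pivot_table = toy.pivot_table(index="Position", columns="Preferred Foot",
                                  values="Overall", aggfunc="mean")
print(via_pivot_table)
```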

In [49]:
# Ordered categories: listing 5 first orders reputation from highest to lowest
ir = pd.api.types.CategoricalDtype([5, 4, 3, 2, 1], ordered= True)
df["International Reputation"] = df["International Reputation"].astype(ir)
sm = pd.api.types.CategoricalDtype([1, 2, 3, 4, 5], ordered= True)
df["Skill Moves"] = df["Skill Moves"].astype(sm)
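An ordered `CategoricalDtype` sorts and compares values by the listed order, not numerically. This small sketch uses toy reputation values. With categories listed as [5, 4, 3, 2, 1], sorting puts 5 first, and 1 is treated as the maximum.

```python
import pandas as pd

# Toy reputation values
rep = pd.Series([1, 3, 5, 2, 1])

# Same ordering as the International Reputation dtype above
rep_dtype = pd.api.types.CategoricalDtype([5, 4, 3, 2, 1], ordered=True)
rep_cat = rep.astype(rep_dtype)

print(rep_cat.sort_values().tolist())  # sorted by the category order
print(rep_cat.max())                   # "largest" per that order
```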

Let's see the relationship among International Reputation, Skill Moves, and Overall.

In [50]:
plt.figure(figsize = (15, 10))
# Select Overall before averaging so non-numeric columns are not aggregated
cat_means = df.groupby(['Skill Moves', 'International Reputation'])['Overall'].mean()
cat_means = cat_means.reset_index(name = 'Overall')
cat_means = cat_means.pivot(index = 'International Reputation', columns = 'Skill Moves',
                            values = 'Overall')
sb.heatmap(cat_means, annot = True, fmt = '.3f',
           cbar_kws = {'label' : 'mean(Overall)'});
plt.title("Mean Overall by International Reputation and Skill Moves");

Higher International Reputation goes together with more Skill Moves and a higher mean Overall.

In [51]:
grid2 = sb.FacetGrid(data = df, hue = 'Skill Moves');
grid2.map(sb.regplot, 'Overall',  "Value", x_jitter = .5, fit_reg = False, scatter_kws = {"alpha":.4})
figure = plt.gcf()
figure.set_size_inches(12, 8);
# plt.xlim(70, 100)
plt.yscale('log')
plt.legend(title = 'Skill Moves Rate');
plt.title("How a player's Overall rating relates to Value");

Players with 4 or 5 Skill Moves have high Overall ratings and high Values, which is intuitive!

Let's check how Height and Weight relate to a player's Stamina.

In [52]:
plt.figure(figsize = (12, 8))
plt.scatter(data = df, x = 'Weight', y = 'Height', c = 'Stamina');
plt.colorbar(label = 'Stamina Rate');
plt.title("How Weight and Height affect the player's Stamina");
plt.xlabel('Weight[lbs]')
plt.ylabel("Height[cms]");

It looks like Stamina tends to drop as height and weight increase!
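The colored scatter can be summarized by binning Weight and averaging Stamina per bin. A steadily decreasing sequence supports the reading above without implying that weight causes lower stamina. This sketch uses hypothetical players, and the bin edges are chosen arbitrarily.

```python
import pandas as pd

# Hypothetical players: weight in lbs and a stamina rating
toy = pd.DataFrame({
    "Weight": [140, 150, 155, 165, 170, 180, 195, 210, 220],
    "Stamina": [75, 72, 74, 70, 68, 66, 55, 40, 35],
})

# Group weights into ranges (arbitrary edges) and average Stamina in each
toy["WeightBin"] = pd.cut(toy["Weight"], bins=[130, 160, 190, 230])
binned = toy.groupby("WeightBin", observed=True)["Stamina"].mean()
print(binned)
```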

Finally, let's check the relationship among HeadingAccuracy, Height, and Jumping.

In [53]:
df.columns
Out[53]:
Index(['ID', 'Name', 'Age', 'Nationality', 'Overall', 'Potential', 'Club',
       'Value', 'Wage', 'Special', 'Preferred Foot',
       'International Reputation', 'Weak Foot', 'Skill Moves', 'Work Rate',
       'Position', 'Height', 'Weight', 'Crossing', 'Finishing',
       'HeadingAccuracy', 'ShortPassing', 'Volleys', 'Dribbling', 'Curve',
       'FKAccuracy', 'LongPassing', 'BallControl', 'Acceleration',
       'SprintSpeed', 'Agility', 'Reactions', 'Balance', 'ShotPower',
       'Jumping', 'Stamina', 'Strength', 'LongShots', 'Aggression',
       'Interceptions', 'Positioning', 'Vision', 'Penalties', 'Composure',
       'Marking', 'StandingTackle', 'SlidingTackle'],
      dtype='object')
In [54]:
plt.scatter(data = df, y = 'HeadingAccuracy', x = 'Jumping', c = 'Height', cmap= 'viridis_r');
plt.colorbar(label = 'Height[cms]');
plt.title("Effect of Jumping on Heading Accuracy with respect to the player's height");
plt.xlabel("Jumping Rate")
plt.ylabel("Heading Accuracy Rate");

It looks like there is no clear pattern among them!

This section presents the relationships and insights obtained in the exploratory steps above. Most of the graphs repeat earlier ones, but they are polished and organized so they are easier to interpret.

In [55]:
plt.figure(figsize = (12 , 8))
bins = range(15, 46, 5)
n, bins, patches = plt.hist(data = df, x = 'Age', bins = bins, ec = 'black');
patches[1].set_color("#ef4f4f");
patches[1].set_edgecolor("black");
plt.title("Players' Age Distribution");
plt.xlabel("Age");
plt.ylabel("Frequency");

Most players are between 20 and 25 years old.
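The cell above highlights `patches[1]` by hand, which only works if you already know that the 20 to 25 bin is the tallest. This sketch colors the most populated bin automatically, using made-up ages and a non-interactive backend. It uses `set_facecolor`, which keeps the black edge that `set_color` would overwrite.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Made-up ages standing in for df.Age
ages = np.array([17, 19, 21, 22, 22, 23, 24, 24, 26, 28, 31, 35, 38])

fig, ax = plt.subplots(figsize=(12, 8))
n, bins, patches = ax.hist(ages, bins=range(15, 46, 5), ec="black")

# Color whichever bin has the most players
tallest = int(np.argmax(n))
patches[tallest].set_facecolor("#ef4f4f")
print(f"Most common age range: {bins[tallest]:.0f}-{bins[tallest + 1]:.0f}")
```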

In [56]:
plt.figure(figsize = (12 , 8));
bins = range(100, 245, 5)
n, bins, patches = plt.hist(data = df, x = 'Weight', bins = bins, ec = 'black');
plt.title("Players' Weights Distribution");
plt.xlabel("Weight[lbs]");
plt.ylabel("Frequency");

The weight distribution looks bimodal, with peaks around 150 lbs and 175 lbs.

In [57]:
plt.figure(figsize = (12 , 8))
bins = range(150, 215, 5)
plt.hist(data = df, x = 'Height', bins = bins, ec = 'black');
plt.title("Players' Height Distribution");
plt.xlabel("Height[cms]");
plt.ylabel("Frequency");

Over 4,000 players have a height of around 160 cm, while many others are between 185 and 190 cm tall.

In [58]:
plt.figure(figsize = (12 , 8))
top10players = df[['Name', 'Overall']].sort_values(['Overall'], ascending = False)[:10]
sb.barplot(data = top10players, y = 'Name', x = "Overall", color = sb.color_palette()[0]);
plt.xlim(85, 95);
plt.title("TOP 10 PLAYERS IN FIFA 19");
plt.xlabel("Overall Performance Rate");
plt.ylabel("Player's Name");

Ronaldo and Messi come first, followed by Neymar Jr.!

In [60]:
plt.figure(figsize = (12 , 8))
top10valued = df[['Name', 'Value']].sort_values(['Value'], ascending = False)[:10]
sb.barplot(data = top10valued, y = "Name", x = "Value", color= sb.color_palette()[0]);
plt.xlim(.5e8, 1.3e8)
# Ticks every 10 million, labelled in millions
xticks = np.arange(50e6, 131e6, 10e6)
xlabels = ['{:.0f}'.format(x / 1e6) for x in xticks]
plt.xticks(xticks, xlabels)
plt.title("TOP 10 MOST VALUABLE PLAYERS IN FIFA 19");
plt.xlabel("Value [€ millions]");
plt.ylabel("Player's Name");
plt.grid()

Unsurprisingly, Neymar has the highest market value, followed by Messi and De Bruyne.
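The cell above builds tick labels by hand. A matplotlib tick formatter can instead keep labels readable for whatever ticks end up being drawn. This sketch uses made-up values, and `millions` is a hypothetical helper that is not part of the notebook.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter


def millions(x, pos):
    """Format a raw tick value such as 110500000 as '110.5M' (pos is unused)."""
    return f"{x / 1e6:.1f}M"


# Made-up values in raw currency units
names = ["P1", "P2", "P3"]
values = [118.5e6, 102e6, 93e6]

fig, ax = plt.subplots(figsize=(12, 8))
ax.barh(names, values)
ax.set_xlim(50e6, 130e6)
ax.xaxis.set_major_formatter(FuncFormatter(millions))  # applied to every major tick
ax.set_xlabel("Value (millions)")
```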

In [87]:
plt.figure(figsize = (12 , 8))
top10clubs = df[['Club', "Overall"]].groupby("Club").mean().sort_values('Overall',ascending = False)[:10].reset_index()
sb.barplot(data = top10clubs, y = 'Club', x = 'Overall', color= sb.color_palette()[0]);
plt.xlim(70, 85);
plt.title("TOP 10 CLUBS IN FIFA 19");
plt.xlabel("Overall Performance Rate");
plt.ylabel("Club");

Five Italian clubs are in the top 10: Juventus comes first, followed by Napoli and Inter.

In [95]:
plt.figure(figsize= (20, 12))
plt.subplot(1,3,1)
plt.scatter(data = df, x = 'Reactions', y = 'Overall')
plt.title("Reactions Vs Overall Performance", fontsize = 14, weight = "bold")
plt.xlabel("Reactions Rate", fontsize = 14, weight = "bold")
plt.ylabel("Overall Performance", fontsize = 14, weight = "bold");
#--
plt.subplot(1,3,2)
plt.scatter(data = df, x = 'Composure', y = 'Overall')
plt.title("Composure Vs Overall Performance", fontsize = 14, weight = "bold")
plt.xlabel("Composure Rate", fontsize = 14, weight = "bold")
plt.ylabel("Overall Performance", fontsize = 14, weight = "bold");
#--
plt.subplot(1,3,3)
plt.scatter(data = df, x = 'Vision', y = 'Overall')
plt.title("Vision Vs Overall Performance", fontsize = 14, weight = "bold")
plt.xlabel("Vision Rate", fontsize = 14, weight = "bold");
plt.ylabel("Overall Performance", fontsize = 14, weight = "bold");
plt.suptitle('Relation Between Soccer abilities and Overall Performance', fontsize = 16, weight = "bold");

Reactions, Composure, and Vision all show a positive relationship with Overall.

In [96]:
plt.figure(figsize = (12, 8))
sb.boxplot(data = subDF, y = 'Club', x= 'Wage', order= top10clubs.Club, color = sb.color_palette()[0]);
jev_med = subDF[subDF.Club == "Juventus"].Wage.median()
plt.axvline(x = jev_med, color = 'red');
plt.title("Five-number summary of wages for the top 10 performing clubs", fontsize = 14, weight = "bold");
plt.ylabel("Club", fontsize = 14, weight = "bold");
plt.xlabel("Wage [€]", fontsize = 14, weight = "bold");

Juventus has the highest median wage, followed by Barcelona and Real Madrid, while Roma comes last!

In [97]:
plt.figure(figsize = (20, 20))
plt.title("Correlations among all soccer abilities")
sb.heatmap(df[['Overall', 'Crossing', 'Finishing',
       'HeadingAccuracy', 'ShortPassing', 'Volleys', 'Dribbling', 'Curve',
       'FKAccuracy', 'LongPassing', 'BallControl', 'Acceleration',
       'SprintSpeed', 'Agility', 'Reactions', 'Balance', 'ShotPower',
       'Jumping', 'Stamina', 'Strength', 'LongShots', 'Aggression',
       'Interceptions', 'Positioning', 'Vision', 'Penalties', 'Composure',
       'Marking', 'StandingTackle', 'SlidingTackle']].corr(), annot= True, cmap= "viridis_r");
In [98]:
plt.figure(figsize =(20, 8))
sb.pointplot(data = df, x = "Position", y = 'Overall', hue = "Preferred Foot", dodge = .2, linestyles = "");
plt.title("Position vs. Overall Performance", fontsize = 14, weight = "bold");
plt.xlabel("Position", fontsize = 14, weight = "bold");
plt.ylabel("Overall Performance", fontsize = 14, weight = "bold");

Left-footed players at Left Forward (LF) have higher Overall ratings than right-footed ones!

In [100]:
plt.figure(figsize =(20, 8))
sb.pointplot(data = df, x = "Position", y = 'ShotPower', hue = "Preferred Foot", dodge = .2, linestyles = "")
plt.title("Position vs. Shooting Power", fontsize = 14, weight = "bold");
plt.xlabel("Position", fontsize = 14, weight = "bold");
plt.ylabel("Shooting Power", fontsize = 14, weight = "bold");

Forwards have higher shot power than midfielders, and footedness doesn't affect it much!

In [101]:
plt.figure(figsize = (15, 10))
cat_means = df.groupby(['Skill Moves', 'International Reputation'])['Overall'].mean()
cat_means = cat_means.reset_index(name = 'Overall')
cat_means = cat_means.pivot(index = 'International Reputation', columns = 'Skill Moves',
                            values = 'Overall')
sb.heatmap(cat_means, annot = True, fmt = '.3f',
           cbar_kws = {'label' : 'mean(Overall)'});
plt.title("Skill Moves vs. International Reputation vs. Overall Performance", fontsize = 14, weight = "bold");

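Yes, they are strongly related: players with more Skill Moves and a higher International Reputation have a higher mean Overall!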

In [112]:
plt.figure(figsize = (15, 10))
plt.scatter(data = df, x = 'Weight', y = 'Height', c = 'Stamina')
plt.colorbar(label = 'Stamina Rate');
plt.title("Weight vs. Height vs. Stamina", fontsize = 14, weight = "bold");
plt.xlabel("Weight[lbs]");
plt.ylabel("Height[cms]");

Low Stamina ratings are mostly observed at high weights and heights!